Spatial models for areal data

Intro to lattice and irregular lattice

Example 1 - Besag and Besagproper models

inla.doc("besag")

In this example we use the North Carolina Sudden Infant Death Syndrome dataset.

data(nc.sids)
head(nc.sids)
##             CNTY.ID BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79 east north      x
## Ashe           1825  1091     1      10  1364     0      19  164   176 -81.67
## Alleghany      1827   487     0      10   542     3      12  183   182 -50.06
## Surry          1828  3188     5     208  3616     6     260  204   174 -16.14
## Currituck      1831   508     1     123   830     2     145  461   182 406.01
## Northampton    1832  1421     9    1066  1606     3    1197  385   176 281.10
## Hertford       1833  1452     7     954  1838     5    1237  411   176 323.77
##                   y       lon      lat L.id M.id
## Ashe        4052.29 -81.48594 36.43940    1    2
## Alleghany   4059.70 -81.14061 36.52443    1    2
## Surry       4043.76 -80.75312 36.40033    1    2
## Currituck   4035.10 -76.04892 36.45655    1    4
## Northampton 4029.75 -77.44057 36.38799    1    4
## Hertford    4028.10 -76.96474 36.38189    1    4
hist(nc.sids$SID74)

summary(nc.sids)
##     CNTY.ID         BIR74           SID74          NWBIR74      
##  Min.   :1825   Min.   :  248   Min.   : 0.00   Min.   :   1.0  
##  1st Qu.:1902   1st Qu.: 1077   1st Qu.: 2.00   1st Qu.: 190.0  
##  Median :1982   Median : 2180   Median : 4.00   Median : 697.5  
##  Mean   :1986   Mean   : 3300   Mean   : 6.67   Mean   :1051.0  
##  3rd Qu.:2067   3rd Qu.: 3936   3rd Qu.: 8.25   3rd Qu.:1168.5  
##  Max.   :2241   Max.   :21588   Max.   :44.00   Max.   :8027.0  
##      BIR79           SID79          NWBIR79             east      
##  Min.   :  319   Min.   : 0.00   Min.   :    3.0   Min.   : 19.0  
##  1st Qu.: 1336   1st Qu.: 2.00   1st Qu.:  250.5   1st Qu.:178.8  
##  Median : 2636   Median : 5.00   Median :  874.5   Median :285.0  
##  Mean   : 4224   Mean   : 8.36   Mean   : 1352.8   Mean   :271.3  
##  3rd Qu.: 4889   3rd Qu.:10.25   3rd Qu.: 1406.8   3rd Qu.:361.2  
##  Max.   :30757   Max.   :57.00   Max.   :11631.0   Max.   :482.0  
##      north             x                 y             lon        
##  Min.   :  6.0   Min.   :-328.04   Min.   :3757   Min.   :-84.08  
##  1st Qu.: 97.0   1st Qu.: -60.55   1st Qu.:3920   1st Qu.:-81.20  
##  Median :125.5   Median : 114.38   Median :3963   Median :-79.26  
##  Mean   :122.1   Mean   :  91.46   Mean   :3953   Mean   :-79.51  
##  3rd Qu.:151.5   3rd Qu.: 240.03   3rd Qu.:4000   3rd Qu.:-77.87  
##  Max.   :182.0   Max.   : 439.65   Max.   :4060   Max.   :-75.67  
##       lat             L.id           M.id     
##  Min.   :33.92   Min.   :1.00   Min.   :1.00  
##  1st Qu.:35.26   1st Qu.:1.00   1st Qu.:2.00  
##  Median :35.68   Median :2.00   Median :3.00  
##  Mean   :35.62   Mean   :2.12   Mean   :2.67  
##  3rd Qu.:36.05   3rd Qu.:3.00   3rd Qu.:3.25  
##  Max.   :36.52   Max.   :4.00   Max.   :4.00
nc.sids <- st_read(system.file("shapes/sids.shp", package="spData"))
## Reading layer `sids' from data source 
##   `/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/spData/shapes/sids.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 22 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## CRS:           NA

We see that the number of deaths is a count variable and therefore propose a Poisson regression model, i.e.  \[y\sim Poi(E\theta)\] with \[\theta = \exp(\eta)\] and \(E\) is the expected count. In this way, \(\theta\) indicates the risk. We need to calculate the expected count for each observation,

p_hat <- sum(nc.sids$SID74) / sum(nc.sids$BIR74)
nc.sids$E74 <- p_hat * nc.sids$BIR74

We use the NWBIR as a covariate to investigate if it has any influence on the risk of SIDS. We consider the proportion instead of the count,

nc.sids$nwp_hat74 <- nc.sids$NWBIR74 / nc.sids$BIR74

Adding a random intercept adds an independent effect, although the counties are probably not independent in terms of socio-economic diversity. So we should rather include a structured random effect such that some “intercepts” are correlated with each other, conveying the dependence in space. Since we have space divided into discrete non-overlapping parts, we can use a Besag or BYM model.

#Add besag for counties - a spatial model
#Graph - neighbourhood structure
nc.adj <- poly2nb(nc.sids)
B.nc <- nb2mat(nc.adj, style = "B")

plot(nc.sids$geometry, border="grey")
plot(nc.adj, coords = nc.sids$geometry,  add = TRUE, col = "blue")

#Add covariate to spatial dataset    
nc.sids$CNTY_ID2 <- 1:(length(nc.sids$CNTY_ID))

#Fit the spatial model
result3_sids <- inla(SID74 ~ nwp_hat74 + f(CNTY_ID2, model = "besagproper", graph = B.nc), 
                     data = as.data.frame(nc.sids), 
                     family = "poisson",
                     control.predictor = list(compute = TRUE),
                     E = E74)

The posterior summary is

summary(result3_sids)
## Time used:
##     Pre = 4.59, Running = 1.08, Post = 0.101, Total = 5.77 
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept) -0.652 0.127     -0.910   -0.650     -0.405 -0.650   0
## nwp_hat74    1.883 0.283      1.332    1.881      2.449  1.881   0
## 
## Random effects:
##   Name     Model
##     CNTY_ID2 Proper version of Besags ICAR model
## 
## Model hyperparameters:
##                        mean   sd 0.025quant 0.5quant 0.975quant  mode
## Precision for CNTY_ID2 7.46 6.90      1.502     5.47      25.79 3.248
## Diagonal for CNTY_ID2  1.08 1.16      0.097     0.73       4.16 0.268
## 
## Marginal log-Likelihood:  -227.11 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

The two extra hyperparameters from the besagproper effect are the marginal standard deviation, \(\sigma\) and the properness parameter, \(d\), and their marginal posteriors are

res3_sd_bym <- inla.tmarginal(function(x) sqrt(1/x), result3_sids$marginals.hyperpar$`Precision for CNTY_ID2`)
plot(res3_sd_bym, type = "l", xlab = expression(sigma), ylab = expression(paste("f(", sigma ,"|y)")))

plot(result3_sids$marginals.hyperpar$`Diagonal for CNTY_ID2`, type = "l", xlab = "d", ylab = "f(d|y)")

We can again visually see the fit of the predictions from the model to the data,

nc.sids$fit_besag <- result3_sids$summary.fitted.values$mean*nc.sids$E74
plot(nc.sids["SID74"], breaks = seq(0,50, by = 5), main = "Observed counts")

plot(nc.sids["fit_besag"], breaks = seq(0,50, by = 5), main = "Predicted counts")

plot(nc.sids$SID74, ylab = "SIDS Counts")
points(nc.sids$E74*result3_sids$summary.fitted.values[,1], col = "purple", pch = 8, cex = 0.8)

When we compare the fit between models we can look at the marginal log-likelihoods, DIC or WAIC.

result1_sids <- inla(SID74 ~ nwp_hat74, data = nc.sids, 
                     family = "poisson",
                     control.compute = list(dic = TRUE, waic = TRUE),
                     control.predictor = list(compute = TRUE),
                     E = E74)

result2_sids <- inla(SID74 ~ nwp_hat74 + f(CNTY_ID2, model = "iid"), 
                     data = nc.sids, 
                     family = "poisson",
                     control.compute = list(dic = TRUE, waic = TRUE),
                     control.predictor = list(compute = TRUE),
                     E = E74)

result3_sids <- inla(SID74 ~ nwp_hat74 + f(CNTY_ID2, model = "besagproper", graph = B.nc), 
                     data = as.data.frame(nc.sids), 
                     family = "poisson",
                     control.compute = list(dic = TRUE, waic = TRUE),
                     control.predictor = list(compute = TRUE),
                     E = E74)

print(matrix(rbind(cbind(result1_sids$mlik[1], result1_sids$dic[1], result1_sids$waic[1]),
            cbind(result2_sids$mlik[1], result2_sids$dic[1], result2_sids$waic[1]),
            cbind(result3_sids$mlik[1], result3_sids$dic[1], result3_sids$waic[1])), 
            nrow = 3, ncol = 3,
            dimnames = list(c("Model 1", "Model 2 - iid", "Model 3 - Besag"),
                            c("Marginal log-likelihood", "DIC", "WAIC")
                            )))
##                 Marginal log-likelihood DIC      WAIC    
## Model 1         -226.1261               441.6218 442.7613
## Model 2 - iid   -227.215                431.0399 437.1806
## Model 3 - Besag -227.2152               431.8535 437.6031

Let’s visualise the fitted values of all the models.

nc.sids$fit_besag <- result3_sids$summary.fitted.values$mean*nc.sids$E74
nc.sids$fit_fixed <- result1_sids$summary.fitted.values$mean*nc.sids$E74
nc.sids$fit_iid <- result2_sids$summary.fitted.values$mean*nc.sids$E74
plot(nc.sids["SID74"], breaks = seq(0,50, by = 5), main = "Observed counts")

plot(nc.sids["fit_fixed"], breaks = seq(0,50, by = 5), main = "Predicted counts - fixed")

plot(nc.sids["fit_iid"], breaks = seq(0,50, by = 5), main = "Predicted counts - iid")

plot(nc.sids["fit_besag"], breaks = seq(0,50, by = 5), main = "Predicted counts - besag")

plot(nc.sids$SID74, ylab = "SIDS Counts")
points(nc.sids$E74*result1_sids$summary.fitted.values[,1], col = "blue", pch = 10)
points(nc.sids$E74*result2_sids$summary.fitted.values[,1], col = "red", pch = 19, cex = 0.5)
points(nc.sids$E74*result3_sids$summary.fitted.values[,1], col = "purple", pch = 8, cex = 0.8)

Example 2 - BYM and BYM2 models

inla.doc("bym")
inla.doc("bym2")

Let’s revisit the NC SIDS dataset example.
Now we add a BYM effect for the different counties,

result4_sids <- inla(SID74 ~ nwp_hat74 + f(CNTY_ID2, model = "bym2", graph = B.nc), 
                     data = as.data.frame(nc.sids), 
                     family = "poisson",
                     control.compute = list(dic = TRUE, waic = TRUE),
                     control.predictor = list(compute = TRUE),
                     E = E74)

The posterior summary is

summary(result4_sids)
## Time used:
##     Pre = 41.1, Running = 1.07, Post = 0.0951, Total = 42.2 
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept) -0.676 0.122     -0.922   -0.674     -0.441 -0.674   0
## nwp_hat74    1.955 0.317      1.347    1.949      2.596  1.949   0
## 
## Random effects:
##   Name     Model
##     CNTY_ID2 BYM2 model
## 
## Model hyperparameters:
##                          mean    sd 0.025quant 0.5quant 0.975quant   mode
## Precision for CNTY_ID2 18.109 9.536      6.742   15.867      42.96 12.346
## Phi for CNTY_ID2        0.276 0.226      0.012    0.212       0.81  0.024
## 
## Deviance Information Criterion (DIC) ...............: 427.64
## Deviance Information Criterion (DIC, saturated) ....: 122.21
## Effective number of parameters .....................: 25.54
## 
## Watanabe-Akaike information criterion (WAIC) ...: 429.71
## Effective number of parameters .................: 22.31
## 
## Marginal log-Likelihood:  -170.41 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

The two hyperparameters from the BYM effect are the marginal standard deviation, \(\sigma\) and the weight parameter, \(\phi\), and their marginal posteriors are

res4_sd_bym <- inla.tmarginal(function(x) sqrt(1/x), result4_sids$marginals.hyperpar$`Precision for CNTY_ID2`)
plot(res4_sd_bym, type = "l", xlab = expression(sigma), ylab = expression(paste("f(", sigma ,"|y)")))

plot(result4_sids$marginals.hyperpar$`Phi for CNTY_ID2`, type = "l", xlab = expression(phi), ylab = expression(paste("f(", phi ,"|y)")))

We can again visually see the fit of the predictions from the model to the data,

nc.sids$fit_bym <- result4_sids$summary.fitted.values$mean*nc.sids$E74
plot(nc.sids[c("SID74", "fit_fixed", "fit_iid", "fit_besag", "fit_bym")],
       breaks = seq(0, 50, by = 5),
     main = "Observed and predicted counts")

plot(nc.sids$SID74, ylab = "SIDS Counts")
points(nc.sids$E74*result1_sids$summary.fitted.values[,1], col = "blue", pch = 10)
points(nc.sids$E74*result2_sids$summary.fitted.values[,1], col = "red", pch = 19, cex = 0.5)
points(nc.sids$E74*result3_sids$summary.fitted.values[,1], col = "purple", pch = 8, cex = 0.8)
points(nc.sids$E74*result4_sids$summary.fitted.values[,1], col = "darkgreen", pch = 2, cex = 0.8)

We can visualise the spatial and specific effects as well.

nc.sids$bym <- result4_sids$summary.random$CNTY_ID2[1:max(nc.sids$CNTY_ID2),"mean"]
plot(nc.sids[c("bym")],
       breaks = seq(-0.4,0.4, by = 0.05),
     main = "BYM component")

nc.sids$bym_besag <- result4_sids$summary.random$CNTY_ID2[1:max(nc.sids$CNTY_ID2)+max(nc.sids$CNTY_ID2),"mean"]
plot(nc.sids[c("bym_besag")],
       breaks = seq(-1,1, by = 0.1),
     main = "Besag component")

nc.sids$bym_iid <- (result4_sids$summary.random$CNTY_ID2[1:max(nc.sids$CNTY_ID2),"mean"]*
  sqrt(result4_sids$summary.hyperpar["Precision for CNTY_ID2", "mean"]) -
  result4_sids$summary.random$CNTY_ID2[1:max(nc.sids$CNTY_ID2)+max(nc.sids$CNTY_ID2),"mean"]*
  sqrt(result4_sids$summary.hyperpar["Phi for CNTY_ID2", "mean"]))/
(1-sqrt(result4_sids$summary.hyperpar["Phi for CNTY_ID2", "mean"]))

plot(nc.sids[c("bym_iid")],
       breaks = seq(-3,3, by = 0.1),
     main = "IID component")

When we compare the fit between models we can look at the marginal log-likelihoods, DIC or WAIC.

print(matrix(rbind(cbind(result1_sids$mlik[1], result1_sids$dic[1], result1_sids$waic[1]),
            cbind(result2_sids$mlik[1], result2_sids$dic[1], result2_sids$waic[1]),
            cbind(result3_sids$mlik[1], result3_sids$dic[1], result3_sids$waic[1]),
            cbind(result4_sids$mlik[1], result4_sids$dic[1], result4_sids$waic[1])), 
            nrow = 4, ncol = 3,
            dimnames = list(c("Model 1", "Model 2 - iid", "Model 3 - Besag", "Model 4 - BYM"),
                            c("Marginal log-likelihood", "DIC", "WAIC")
                            )))
##                 Marginal log-likelihood DIC      WAIC    
## Model 1         -226.1261               441.6218 442.7613
## Model 2 - iid   -227.215                431.0399 437.1806
## Model 3 - Besag -227.2152               431.8535 437.6031
## Model 4 - BYM   -170.6617               427.6361 429.7086

We can also do cross-validation as a means of measuring prediction performance (see https://arxiv.org/pdf/2210.04482).

result1_sids <- inla(SID74 ~ nwp_hat74, data = nc.sids, 
                     family = "poisson",
                     control.compute = list(dic = TRUE, waic = TRUE,
                                            cpo = TRUE, return.marginals = FALSE, control.gcpo = list(enable = TRUE,num.level.sets = 3)),
                     control.predictor = list(compute = TRUE),
                     E = E74)

result2_sids <- inla(SID74 ~ nwp_hat74 + f(CNTY_ID2, model = "iid"), 
                     data = nc.sids, 
                     family = "poisson",
                     control.compute = list(dic = TRUE, waic = TRUE,
                                            cpo = TRUE, return.marginals = FALSE, control.gcpo = list(enable = TRUE,num.level.sets = 3)),
                     control.predictor = list(compute = TRUE),
                     E = E74)

result3_sids <- inla(SID74 ~ nwp_hat74 + f(CNTY_ID2, model = "besagproper", graph = B.nc), 
                     data = as.data.frame(nc.sids), 
                     family = "poisson",
                     control.compute = list(dic = TRUE, waic = TRUE,
                                            cpo = TRUE, return.marginals = FALSE, control.gcpo = list(enable = TRUE,num.level.sets = 3)),
                     control.predictor = list(compute = TRUE),
                     E = E74)

result4_sids <- inla(SID74 ~ nwp_hat74 + f(CNTY_ID2, model = "bym2", graph = B.nc), 
                     data = as.data.frame(nc.sids), 
                     family = "poisson",
                     control.compute = list(dic = TRUE, waic = TRUE,
                                            po = TRUE, return.marginals = FALSE, control.gcpo = list(enable = TRUE,num.level.sets = 3)),
                     control.predictor = list(compute = TRUE),
                     E = E74)

result1_sids$gcpo$groups[[1]]
## $idx
## [1]  1 38 78
## 
## $corr
## [1] 1.0000000 0.9999960 0.9999973
result2_sids$gcpo$groups[[1]]
## $idx
## [1]  1 22 32 38 78 90
## 
## $corr
## [1] 1.0000000 0.1693715 0.1693715 0.1663808 0.1663808 0.1693715
result3_sids$gcpo$groups[[1]]
## $idx
## [1]  1  2 18
## 
## $corr
## [1] 1.0000000 0.3897119 0.3726388
result4_sids$gcpo$groups[[1]]
## $idx
## [1]  1  2 19
## 
## $corr
## [1] 1.0000000 0.2199047 0.2022050

The results for the model selection and validation are:

print(matrix(rbind(cbind(result1_sids$mlik[1], result1_sids$dic[1], result1_sids$waic[1], mean(result1_sids$gcpo$gcpo)),
            cbind(result2_sids$mlik[1], result2_sids$dic[1], result2_sids$waic[1], mean(result2_sids$gcpo$gcpo)),
            cbind(result3_sids$mlik[1], result3_sids$dic[1], result3_sids$waic[1], mean(result3_sids$gcpo$gcpo)),
            cbind(result4_sids$mlik[1], result4_sids$dic[1], result4_sids$waic[1], mean(result4_sids$gcpo$gcpo))), 
            nrow = 4, ncol = 4,
            dimnames = list(c("Model 1", "Model 2 - iid", "Model 3 - Besag", "Model 4 - BYM"),
                            c("Marginal log-likelihood", "DIC", "WAIC", "G-CV score")
                            )))
##                 Marginal log-likelihood DIC      WAIC     G-CV score
## Model 1         -226.1261               441.6218 442.7613 0.1829267 
## Model 2 - iid   -227.215                431.0399 437.1806 0.1769652 
## Model 3 - Besag -227.2152               431.8534 437.6031 0.1783804 
## Model 4 - BYM   -170.6617               427.6361 429.7086 0.1756431

Example 3 - Joint disease mapping models (using Besag2 model)

We can do multivariate disease mapping when we take into account the effect of the presence or absence of other diseases simultaneously. These are treated as endogenous variables. \[\begin{eqnarray*} y_{i 1}|\lambda_{i1} &\sim& \text { Poisson }\left(E_{i 1} \lambda_{i 1}\right)\\ y_{i 2}|\lambda_{i2} &\sim& \text { Poisson }\left(E_{i 2} \lambda_{i 2}\right) \\ \log (\lambda_{i 1}) &=& m_{1} + \sum_{f = 1}^{F_1} \beta_f X_{i f} + \sum_{r=1}^{R_1} \rho^r(u_{i r}) + b_{i 1} + S_{i}\\ \log (\lambda_{i 2}) &=& m_{2} + \sum_{f = 1}^{F_2} \gamma_f Z_{i f} + \sum_{r=1}^{R_2} \xi^r(v_{i r})+ b_{i 2} + a \, S_{i}, \label{eq:jointmeanmodel} \end{eqnarray*}\]

# read data 
merg.data <- readRDS("Data/MAP_data.RDS")

# The map 
map <- ne_countries(returnclass = "sf")

# Extract the map and compile the shape
if (T){
names(map)[names(map) == "iso_a3"] <- "ISO3"
names(map)[names(map) == "name"] <- "NAME"
map <- map[order(map$name_long),]
map$name_long[grepl("Gambia", map$name_long)] <- "Gambia"
map$name_long[map$name_long == "Republic of the Congo"] <- "Congo"

map <- map[order(map$name_long),]

# Select only the common countries from the map
sub_map <- match(map$name_long , merg.data$Common.countries)
map$cc <- sub_map
map_2 <- subset(map,map$cc!="NA" )
map_2 <- map_2[order(map_2$name_long),]

map_2$D.E = merg.data$D.E
map_2$M.E = merg.data$M.E
map_2$M.cases = merg.data$M.cases
map_2$D.cases = merg.data$D.cases
map_2$M.pop = merg.data$M.pop
map_2$D.pop = merg.data$D.pop


library(cleangeo)
rr <- clgeo_CollectionReport(map_2)
summary(rr)
issues <- rr[rr$type == NA,]
map_2_c <- clgeo_Clean(map_2)
}

# Neighbourhood graph
nb <- poly2nb(map_2_c)
## although coordinates are longitude/latitude, st_intersects assumes that they
## are planar
nb2INLA("Day 2/map.adj",nb = nb)
g <- inla.read.graph(filename = "Day 2/map.adj")
map_2$ID = 1:length(map_2_c$level)

par(mar = c(0,0,0,0))
plot(map$geometry[map$continent == "Africa"], border = "grey")
plot(map_2_c, add = TRUE)
text(coordinates(map_2_c)[,1], coordinates(map_2_c)[,2], label = map_2$ID)
plot(nb, coords = map_2_c,  add = TRUE, col = "blue", pch = 1, cex = 0.01)

#Edit the graph to manually connect 12 and 14 & 15 and 8
if (F){
ind = c(12,14,15,8)
g1 = g
g1$nnbs[ind]=g1$nnbs[ind] + 1
g1$nbs[[12]] = c(g1$nbs[[12]], 14)
g1$nbs[[14]] = c(g1$nbs[[14]], 12)
g1$nbs[[15]] = c(g1$nbs[[15]], 8)
g1$nbs[[8]] = c(g1$nbs[[8]], 15)
g1$cc$id = rep(1, g1$n)
g1$cc$n = 1
g1$cc$nodes = 1:g1$n
g1$cc$mean = rep(1, g1$n)
  
plot(g)
dev.off()
plot(g1)
}
#OR edit the neighbourhood
if (F){
  nb1 <- nb
  nb1[[12]] <- as.integer(c(14))
  nb1[[14]] <- as.integer(c(nb[[14]], 12), length = length(nb[[14]])+1)
  nb1[[8]] <- as.integer(c(nb[[8]], 15), length = length(nb[[8]])+1)
  nb1[[15]] <- as.integer(c(nb[[15]], 8), length = length(nb[[15]])+1)

  par(mar = c(0,0,0,0))
plot(map$geometry[map$continent == "Africa"], border = "grey")
plot(map_2_c, add = TRUE)
text(coordinates(map_2_c)[,1], coordinates(map_2_c)[,2], label = map_2_c$ID)
plot(nb1, coords = map_2_c,  add = TRUE, col = "blue", pch = 1, cex = 0.01)

nb2INLA("Day 2/map1.adj",nb = nb1)
g1_alt <- inla.read.graph(filename = "Day 2/map1.adj")

plot(g1)
dev.off()
plot(g1_alt)
  }

Now we can model Malaria and G6PD mean incidence jointly by sharing a common spatial field using besag2 (see https://doi.org/10.1098/rsos.230851).

b = length(merg.data$M.cases)
y1 = c(merg.data$M.cases , rep(NA,b))
y2 = c(rep(NA,b) , merg.data$D.cases)

E1 = c(merg.data$M.E,rep(NA,b))
E2 = c(rep(NA,b) , merg.data$D.E)

yy = cbind(y1,y2)
EE = cbind(E1, E2)
mu = c(rep(1, b) , rep(2,b))
b1 = c(1:b,rep(NA ,b))
b2 = c(rep(NA,b),1:b)

ID = c(1:b , rep(NA ,b))

ID_copy = c(rep(NA , b), 1:b + b)

m = as.factor(mu)
d = data.frame(yy, m, ID, ID_copy, b1, b2)  

formula = yy ~ -1 + m + offset(log(E1))+ offset(log(E2))+
  f(ID, model = "besag2", graph=g, scale.model = T)+
  f(ID_copy, copy="ID", hyper = list(beta = list(fixed = TRUE)))+
  f(b1 , model = "bym2", graph=g, scale.model=TRUE)+
  f(b2, model = "bym2", graph=g, scale.model=TRUE)

r1 <- inla(formula,
           family = c("poisson","poisson"),
           data = d,
           verbose = F,
           control.predictor = list(compute = T))

round(r1$summary.fixed[,c(1:3,5)],3)
##     mean    sd 0.025quant 0.975quant
## m1 7.882 0.316      7.257      8.509
## m2 4.155 0.207      3.741      4.559
round(r1$summary.hyperpar[,c(1,3,5,6)],3)
##                              mean 0.025quant 0.975quant      mode
## Precision for ID        31641.961   3964.354  98862.855 11522.015
## Scale paramter a for ID     1.063      0.573      1.791     0.945
## Precision for b1            0.410      0.231      0.661     0.374
## Phi for b1                  0.184      0.013      0.565     0.036
## Precision for b2            1.045      0.556      1.803     0.912
## Phi for b2                  0.188      0.018      0.552     0.055

We can plot the relative risk from the model.

#Fitted values
map_2$fit_besag2_m <- r1$summary.fitted.values[1:b,"mean"]/map_2$M.E
map_2$fit_besag2_g <- r1$summary.fitted.values[1:b+b,"mean"]/map_2$D.E

l <- leaflet(map_2) %>% addTiles()

pal <- colorNumeric(palette = "YlOrRd", 
                    domain = seq(min(map_2$fit_besag2_m)-0.5,max(map_2$fit_besag2_m)+0.5 , by = 0.1))

l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$fit_besag2_m), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values= ~seq(min(map_2$fit_besag2_m)-0.5,max(map_2$fit_besag2_m)+0.5 , by = 0.1), 
            opacity = 0.5, title = "RR from besag2 model for malaria",
            position = "topright")
pal <- colorNumeric(palette = "YlOrRd", 
                    domain = seq(min(map_2$fit_besag2_g)-0.5,max(map_2$fit_besag2_g)+0.5 , by = 0.1))


l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$fit_besag2_g), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(min(map_2$fit_besag2_g)-0.5,max(map_2$fit_besag2_g)+0.5 , by = 0.1), 
            opacity = 0.5, title = "RR from besag2 model for G6PD",
            position = "topright")
#Common spatial field
map_2$besag_field <- r1$summary.random$ID$mean[1:b]*r1$summary.hyperpar["Scale paramter a for ID", "mean"]

pal <- colorNumeric(palette = "YlOrRd", 
                    domain = seq(min(map_2$besag_field)-0.5,max(map_2$besag_field)+0.5 , by = 0.1))

l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$besag_field), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(min(map_2$besag_field)-0.5,max(map_2$besag_field)+0.5 , by = 0.1), 
            opacity = 0.5, title = "Common spatial field",
            position = "topright")
#Disease-specific BYM effects
map_2$bym2_m <- r1$summary.random$b1$mean[1:b] 
map_2$bym2_g <- r1$summary.random$b2$mean[1:b] 


pal <- colorNumeric(palette = "YlOrRd", 
                    domain = seq(min(map_2$bym2_m)-0.5,max(map_2$bym2_m)+0.5 , by = 0.1))

l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$bym2_m), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(min(map_2$bym2_m)-0.5,max(map_2$bym2_m)+0.5 , by = 0.1), 
            opacity = 0.5, title = "BYM values for malaria",
            position = "topright")
pal <- colorNumeric(palette = "YlOrRd", 
                    domain = seq(min(map_2$bym2_g)-0.5,max(map_2$bym2_g)+0.5 , by = 0.1))


l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$bym2_g), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(min(map_2$bym2_g)-0.5,max(map_2$bym2_g)+0.5 , by = 0.1), 
            opacity = 0.5, title = "BYM values for G6PD",
            position = "topright")
#Disease-specific Besag effects
map_2$bym2_m <- r1$summary.random$b1$mean[b+1:b] 
map_2$bym2_g <- r1$summary.random$b2$mean[b+1:b] 

pal <- colorNumeric(palette = "YlOrRd", 
                    domain = seq(min(map_2$bym2_m)-0.5,max(map_2$bym2_m)+0.5 , by = 0.1))

l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$bym2_m), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(min(map_2$bym2_m)-0.5,max(map_2$bym2_m)+0.5 , by = 0.1), 
            opacity = 0.5, title = "Besag values for malaria",
            position = "topright")

pal <- colorNumeric(palette = "YlOrRd", 
                    domain = seq(min(map_2$bym2_g)-0.5,max(map_2$bym2_g)+0.5 , by = 0.1))


l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$bym2_g), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(min(map_2$bym2_g)-0.5,max(map_2$bym2_g)+0.5 , by = 0.1), 
            opacity = 0.5, title = "Besag values for G6PD",
            position = "topright")

# the iid for malaria
map_2$vm = (r1$summary.random$b1$mean[1:21] * sqrt(r1$summary.hyperpar$mode[3]) -  
             r1$summary.random$b1$mean[22:42] * sqrt(r1$summary.hyperpar$mode[4])) / 
  sqrt(1 - r1$summary.hyperpar$mode[4])

range(map_2$vm)

pal <- colorNumeric(palette = "YlOrRd", domain = seq(min(map_2$vm)-0.2, max(map_2$vm) + 0.5 , by = 0.1))

l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$vm), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(min(map_2$vm)-0.2, max(map_2$vm) + 0.5 , by = 0.1), 
            opacity = 0.5, title = "IID of Malaria",
            position = "topright")


# iid effect for g6pd
map_2$v = (r1$summary.random$b2$mean[1:21] * sqrt(r1$summary.hyperpar$mode[5]) -  
              r1$summary.random$b2$mean[22:42] * sqrt(r1$summary.hyperpar$mode[6])) / 
              sqrt(1 - r1$summary.hyperpar$mode[6])

pal <- colorNumeric(palette = "YlOrRd", domain = seq(min(map_2$v)-0.2, max(map_2$v) + 0.5 , by = 0.1))


l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$v), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(min(map_2$v)-0.2, max(map_2$v) + 0.5 , by = 0.1), 
            opacity = 0.5, title = "IID for G6PD",
            position = "topright")

Example 4 - Quantile models for disease mapping

Now we can also do quantile modeling, instead of the mean risk we can model the high risk areas only and see if there are different covariate impacts for the higher levels of risk.

\[\begin{eqnarray} y_{i 1}|\lambda_{i1} &\sim& \text { Poisson }\left(E_{i1}\lambda_{i 1}\right) \notag\\ y_{i 2}|\lambda_{i2} &\sim& \text { Poisson }\left(E_{i2}\lambda_{i 2}\right) \notag\\ \log (q_{i 1, \alpha_{1}}) &=& \eta_{i 1,\alpha_{1}} = m_{1} + \sum_{f = 1}^{F_1} \beta_f X_{i f} + \sum_{r=1}^{R_1} \rho^r(u_{i r}) + b_{i 1} + S_{i} \notag \\ \log (q_{i 2,\alpha_{2}}) &=& \eta_{i 2,\alpha_{2}} = m_{2} + \sum_{f = 1}^{F_2} \gamma_f Z_{i f} + \sum_{r=1}^{R_2} \xi^r(v_{i r})+ b_{i 2} + c \, S_{i},\label{qa2} \end{eqnarray}\]. In INLA we only need to add the special link function as follows:
control.family = list(control.link = list(model = “quantile”, quantile = alpha)).

First we model the malaria incidence quantile and thereafter the G6PD incidence, while ignoring a shared effect.

alpha0 = 0.8
r.m <- inla(formula = map_2$M.cases ~ 1 + offset(log(map_2$M.E)) +
              f(b1, model = "bym2", scale.model = T,
                graph = g)   ,
            family = "poisson",
            data =  merg.data,
            control.predictor = list(compute = T),
            control.family = list(control.link = list(model = "quantile",
                                                      quantile = alpha0)),
            verbose = F)

round(r.m$summary.hyperpar[,c(1,3,5,6)],3)
##                   mean 0.025quant 0.975quant  mode
## Precision for b1 2.045      1.072      3.480 1.818
## Phi for b1       0.338      0.076      0.695 0.255
# the quantile for G6PD only
r.d <- inla(formula = map_2$D.cases ~ 1 + offset(log(map_2$D.E))+
              f(b2,model = "bym2", scale.model = T,
                graph = g)  ,
            family = "poisson",
            data =  merg.data,
            control.predictor = list(compute = T),
            control.family = list(control.link = list(model = "quantile",
                                                      quantile = alpha0)),
            verbose = F)

round(r.d$summary.hyperpar[,c(1,3,5,6)],3)
##                   mean 0.025quant 0.975quant  mode
## Precision for b2 2.946      1.426      5.328 2.521
## Phi for b2       0.281      0.051      0.656 0.163

Visualize the effects.

# the iid EFFECT malaria
map_2$vm2 = (r.m$summary.random$b1$mean[1:21] * sqrt(r.m$summary.hyperpar$mode[1]) -  
              r.m$summary.random$b1$mean[22:42] * sqrt(r.m$summary.hyperpar$mode[2])) / 
  sqrt(1 - r.m$summary.hyperpar$mode[2])

pal <- colorNumeric(palette = "YlOrRd", domain = seq(-2.6,1.6 , by = 0.1))


l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$vm2), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(-2.6,1.6 , by = 0.1), 
            opacity = 0.5, title = "IID effect for G6PD",
            position = "topright")
# the iid EFFECT g6

map_2$vd2 = (r.d$summary.random$b2$mean[1:21] * sqrt(r.d$summary.hyperpar$mode[1]) -  
              r.d$summary.random$b2$mean[22:42] * sqrt(r.d$summary.hyperpar$mode[2])) / 
  sqrt(1 - r.d$summary.hyperpar$mode[2])

pal <- colorNumeric(palette = "YlOrRd", domain = seq(-5.6,2.6 , by = 0.1))


l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$vd2), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(-5.6,2.6 , by = 0.1), 
            opacity = 0.5, title = "IID effect for G6PD",
            position = "topright")
# the spatial effect for malaria

map_2$u.m2 = r.m$summary.random$b1$mean[22:42]

pal <- colorNumeric(palette = "YlOrRd", domain = seq(-5.6,2.6 , by = 0.1))


l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$u.m2), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(-5.6,2.6 , by = 0.1), 
            opacity = 0.5, title = "Besag effect for Malaria",
            position = "topright")
# the spatial effect for G6PD

map_2$u.d2 = r.d$summary.random$b2$mean[22:42]


pal <- colorNumeric(palette = "YlOrRd", domain =seq(-5.6,2.6 , by = 0.1))


l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$u.d2), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~ seq(-5.6,2.6 , by = 0.1), 
            opacity = 0.5, title = "Besag effect for G6PD",
            position = "topright")
# The relative risk for malaria 

map_2$rrm = r.m$summary.fitted.values[,"mean"]/map_2$M.E
range(map_2$rrm)
## [1] 0.2501084 4.0811419
pal <- colorNumeric(palette = "YlOrRd", 
                    domain = seq(0,6.3 , by = 0.1))
l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$rrm), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~seq(0,6.3 , by = 0.1), 
            opacity = 0.5, title = "RR",
            position = "topright")
# The relative risk for G6PD

map_2$rrd = r.d$summary.fitted.values[,"mean"]/map_2$D.E
range(map_2$rrd)
## [1] 0.3225694 5.0984641
pal <- colorNumeric(palette = "YlOrRd", domain =seq(0, 6.3, length = 9))


l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$rrd), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~ seq(0, 6.3, length = 9), 
            opacity = 0.5, title = "RR",
            position = "topright")
# The predicted cases for Malaria 

map_2$Predicted.MM = r.m$summary.fitted.values$mean

pal <- colorNumeric(palette = "YlOrRd", domain =seq(0,40000, length =9))


l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$Predicted.MM), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~ seq(0,40000, length =9), 
            opacity = 0.5, title = "Predicted",
            position = "topright")
#Observed

l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$M.cases), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~ seq(0,40000, length =9), 
            opacity = 0.5, title = "Observed",
            position = "topright")
# The predicted cases for G6PD


map_2$Predicted.DD = r.d$summary.fitted.values$mean
range(map_2$Predicted.DD)
## [1]   6.111768 423.076806
pal <- colorNumeric(palette = "YlOrRd", domain =seq(0,500 , by = 10))


l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$Predicted.DD), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~ seq(0,500 , by = 10), 
            opacity = 0.5, title = "Predicted",
            position = "topright")
l %>% addPolygons(color = "grey", weight = 1, 
                  fillColor = ~pal(map_2$D.cases), 
                  fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~ seq(0,500 , by = 10), 
            opacity = 0.5, title = "Observed",
            position = "topright")

Now we can do a multivariate quantile disease mapping of these two diseases.

#Use the dataset and formula we already created for the joint mean modeling

formula = yy ~ -1 + m + offset(log(E1))+ offset(log(E2))+
  f(ID, model = "besag2", graph=g, scale.model = T)+
  f(ID_copy, copy="ID", hyper = list(beta = list(fixed = TRUE)))+
  f(b1 , model = "bym2", graph=g, scale.model=TRUE)+
  f(b2, model = "bym2", graph=g, scale.model=TRUE)

alpha0 = 0.7
r1 <- inla(formula,
           family = c("poisson","poisson"),
           
           control.family = list(list(control.link = list(model = "quantile",quantile = alpha0)),
                                 
                                 list(control.link = list(model = "quantile",
                                                          quantile = alpha0))), 
           data = d,
           verbose = F,
           control.compute = list(dic = T, waic = T, cpo = T),
           control.predictor = list(compute = T))


round(r1$summary.fixed[,c(1:3,5)],3)
##     mean    sd 0.025quant 0.975quant
## m1 7.897 0.315      7.274      8.522
## m2 4.211 0.202      3.805      4.607
round(r1$summary.hyperpar[,c(1,3,5,6)],3)
##                              mean 0.025quant 0.975quant      mode
## Precision for ID        31958.097   3969.825 103145.631 11379.477
## Scale paramter a for ID     1.068      0.578      1.829     0.933
## Precision for b1            0.420      0.236      0.671     0.387
## Phi for b1                  0.184      0.014      0.593     0.039
## Precision for b2            1.091      0.579      1.868     0.959
## Phi for b2                  0.188      0.019      0.557     0.056
#Opposite quantiles
r1a <- inla(formula,
           family = c("poisson","poisson"),
           control.family = list(list(control.link = list(model = "quantile",quantile = alpha0)),
                                 list(control.link = list(model = "quantile",
                                                          quantile = 1-alpha0))),
           data = d,
           verbose = F,
           control.compute = list(dic = T, waic = T, cpo = T),
           control.predictor = list(compute = T))


round(r1a$summary.fixed[,c(1:3,5)],3)
##     mean    sd 0.025quant 0.975quant
## m1 7.899 0.192      7.443      8.349
## m2 4.045 0.208      3.628      4.451
round(r1a$summary.hyperpar[,c(1,3,5,6)],3)
##                          mean 0.025quant 0.975quant  mode
## Precision for ID        0.640      0.214      1.510 0.440
## Scale paramter a for ID 1.629      1.150      2.241 1.560
## Precision for b1        6.380      0.189     36.103 0.442
## Phi for b1              0.227      0.009      0.702 0.018
## Precision for b2        1.079      0.546      1.902 0.936
## Phi for b2              0.160      0.011      0.562 0.029

Maybe the besag2 is too strict because of the restriction a>0. We can add more flexibility with copying a besagproper model component as follows:

#High quantile of malaria and low quantile of G6PD
alpha0 = 0.7
d1 <- d
d1$ID_copy = c(rep(NA, b),1:b)

formula1 = yy ~ -1 + m + offset(log(E1))+ offset(log(E2))+
  f(ID, model = "besagproper", graph=g) +
  f(ID_copy, copy="ID", hyper = list(beta = list(fixed = FALSE))) +
  f(b1 , model = "bym2", graph=g, scale.model=TRUE)+
  f(b2, model = "bym2", graph=g, scale.model=TRUE)

r2a <- inla(formula1,
           family = c("poisson","poisson"),
           control.family = list(list(control.link = list(model = "quantile",quantile = alpha0)),
                                 list(control.link = list(model = "quantile",
                                                          quantile = 1-alpha0))),
           data = d1,
           verbose = F,
           control.compute = list(dic = T, waic = T, cpo = T),
           control.predictor = list(compute = T))


round(r2a$summary.fixed[,c(1:3,5)],3)
##     mean    sd 0.025quant 0.975quant
## m1 7.892 0.404      7.088      8.696
## m2 4.044 0.338      3.358      4.712
round(r2a$summary.hyperpar[,c(1,3,5,6)],3)
##                   mean 0.025quant 0.975quant  mode
## Precision for ID 2.392      0.090     14.060 0.218
## Diagonal for ID  1.358      0.176      4.598 0.487
## Precision for b1 0.544      0.273      0.967 0.470
## Phi for b1       0.183      0.012      0.638 0.031
## Precision for b2 2.312      0.254      8.590 0.695
## Phi for b2       0.209      0.014      0.675 0.038
## Beta for ID_copy 1.077      0.436      1.709 1.086
#Median joint modeling

alpha0 = 0.5

formula1 = yy ~ -1 + m + offset(log(E1))+ offset(log(E2))+
  f(ID, model = "besagproper", graph=g) +
  f(ID_copy, copy="ID", hyper = list(beta = list(fixed = FALSE))) +
  f(b1 , model = "bym2", graph=g, scale.model=TRUE)+
  f(b2, model = "bym2", graph=g, scale.model=TRUE)

r3a <- inla(formula1,
           family = c("poisson","poisson"),
           control.family = list(list(control.link = list(model = "quantile",quantile = alpha0)),
                                 list(control.link = list(model = "quantile",
                                                          quantile = 1-alpha0))),
           data = d1,
           verbose = F,
           control.compute = list(dic = T, waic = T, cpo = T),
           control.predictor = list(compute = T))


round(r3a$summary.fixed[,c(1:3,5)],3)
##     mean    sd 0.025quant 0.975quant
## m1 7.877 0.373      7.132      8.621
## m2 4.129 0.300      3.517      4.718
round(r3a$summary.hyperpar[,c(1,3,5,6)],3)
##                    mean 0.025quant 0.975quant  mode
## Precision for ID 13.045      0.067     87.991 0.127
## Diagonal for ID   1.325      0.136      4.896 0.378
## Precision for b1  0.509      0.247      0.902 0.444
## Phi for b1        0.187      0.012      0.646 0.033
## Precision for b2  2.285      0.142      9.531 0.382
## Phi for b2        0.207      0.014      0.675 0.036
## Beta for ID_copy  1.096      0.473      1.732 1.085